{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# PAI-tensorflow DeepFM\n",
    "\n",
    "This example is a PAI-TensorFlow implementation of DeepFM [1]. The code is forked from [GitHub](https://github.com/ChenglongChen/tensorflow-DeepFM) with some modifications.\n",
    "\n",
    "\n",
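    "For reference, DeepFM [1] combines a factorization machine (FM) component and a deep neural network (DNN) component that share the same feature embeddings. In the notation of the paper, where $x$ is the sparse input vector over $d$ features, $w$ holds the first-order weights and $V_i$ is the embedding of feature $i$, the model predicts\n",
    "\n",
    "$$\\hat{y} = \\mathrm{sigmoid}\\left(y_{FM} + y_{DNN}\\right), \\qquad y_{FM} = \\langle w, x \\rangle + \\sum_{i=1}^{d} \\sum_{j=i+1}^{d} \\langle V_i, V_j \\rangle \\, x_i x_j$$\n",
    "\n",
    "where $y_{DNN}$ is the output of the feed-forward network applied to the concatenated field embeddings. The `use_fm` and `use_deep` parameters described below switch the two components on or off.\n",
    "\n",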
    "# Usage\n",
    "## Sample data\n",
    "This implementation accepts only key-value (kv) data: every input feature must be expressed as kv pairs. Sample data:\n",
    "\n",
    "|label|feature1|feature2|multi_tags1|multi_tags2|\n",
    "|----|----|----|----|----|\n",
    "|1|1:0.1|3:0.2|3:0.5,4:0.3|7:0.1|\n",
    "|1|1:0.1|3:0.2,110:0.5,170:0.2|4:0.3|7:0.1|\n",
    "|1|1:0.1|3:0.2|3:0.5,4:0.3|7:0.1|\n",
    "|0|1:0.1|6:0.2,130:0.5|4:0.3|7:0.1|\n",
    "|1|1:0.1|3:0.2|3:0.5,4:0.3|7:0.1|\n",
    "|0|1:0.1|3:0.5|3:0.5,10:0.3|7:0.1,12:0.3|\n",
    "|1|1:0.1|7:0.2|:0.3,4:0.3|7:0.1|\n",
    "|0|1:0.1|3:0.2|3:0.5,9:0.3|7:0.1|"
   ]
  },
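  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The sample cells above can be split into (key, value) pairs with a few lines of Python. This is an illustrative sketch, not the parser in `DeepFM.py`: `parse_kv_column` is a hypothetical helper, its delimiters default to the comma and colon used in the table (`kvs_delimiter` and `kv_delimiter`), and skipping pairs with an empty key, such as `:0.3` in the sample, is an assumption about how malformed input should be handled.\n",
    "\n",
    "```python\n",
    "def parse_kv_column(text, kvs_delimiter=',', kv_delimiter=':'):\n",
    "    # Split a kv cell such as '3:0.5,4:0.3' into [(3, 0.5), (4, 0.3)].\n",
    "    pairs = []\n",
    "    for item in text.split(kvs_delimiter):\n",
    "        key, _, value = item.partition(kv_delimiter)\n",
    "        if not key:\n",
    "            # Skip empty items and pairs without a key, e.g. ':0.3'.\n",
    "            continue\n",
    "        pairs.append((int(key), float(value)))\n",
    "    return pairs\n",
    "\n",
    "\n",
    "# The second sample row from the table above (label column left out).\n",
    "row = {'feature1': '1:0.1', 'feature2': '3:0.2,110:0.5,170:0.2',\n",
    "       'multi_tags1': '4:0.3', 'multi_tags2': '7:0.1'}\n",
    "parsed = {col: parse_kv_column(cell) for col, cell in row.items()}\n",
    "print(parsed['feature2'])  # [(3, 0.2), (110, 0.5), (170, 0.2)]\n",
    "```\n",
    "\n",
    "The same helper handles data that separates pairs with `|`, as the notebook's own files do, by passing `kvs_delimiter='|'`."
   ]
  },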
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
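    "## Data format\n",
    "Input kv data comes in two types. The model selects an embedding by key, and the value is an integer or real-valued weight. Because in practice several tags often describe the same feature, this implementation also supports multi tags embedding: the embeddings of several tags are merged into a single embedding before entering the model, which improves generalization.\n",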
    "\n",
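    "- Plain kv data\n",
    "   \n",
    "   Data such as the feature1 and feature2 columns in the example above. Each kv pair maps to one embedding input; a column may contain several kv pairs, and a key may be absent from a row. Note, however, that plain kv data is a sparse representation while the model accepts only dense input, so a kv map assigns each key to a model slot.\n",
    "   \n",
    "    ![kv map](kv_map.png)\n",
    "    \n",
    "    The kv map shown on the right of the figure is an input parameter of this implementation and can be adjusted to fit your data.\n",
    "    All plain kv keys share one key space, so spreading them over several columns is equivalent to merging them into one column. Every key is subject to the kv map, and all keys share a single maximum key size (`feature_max_size`).\n",
    "    \n",
    "    ***Note***: within one sample, each kv map range should contain at most one key. If keys in the same range are not mutually exclusive, training will behave incorrectly.\n",
    "    \n",
    "- Multi tags data\n",
    "    \n",
    "    Data such as the multi_tags1 and multi_tags2 columns in the example above. Unlike plain kv data, one column of this type represents exactly one feature, so multiple multi tags features must be stored in separate columns. Each column has its own independent key space, but to simplify configuration all columns share the same maximum key size (`multi_tags_max_size`). Keys of this type are not subject to the kv map.\n",
    "    \n",
    "    ***Note***: the embeddings of multiple tags can currently only be combined by summation.\n",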
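    "\n",
    "The kv map lookup and the multi tags sum described above can be sketched in plain Python with NumPy. This is an illustration under stated assumptions, not the code in `DeepFM.py`. The helpers `build_slot_ranges`, `slot_for_key` and `sum_pool_tags` are hypothetical. The flat `kv_map` layout `[[id], [start, end], ...]` is the one passed in the notebook cells below, and slots are numbered in order of appearance, which matches the slot numbers printed when the model is built. Ranges are treated as half-open, `[start, end)`, because adjacent ranges such as `[1, 300]` and `[300, 700]` share an endpoint; and the multi tags sum is assumed to weight each tag embedding by its value, which the text above does not specify.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def build_slot_ranges(kv_map):\n",
    "    # Flat form [[id], [start, end], [id], [start, end], ...] -> [(start, end), ...].\n",
    "    # Slot numbers are list positions 0, 1, ... (assumption, matches the build log).\n",
    "    return [tuple(kv_map[i + 1]) for i in range(0, len(kv_map), 2)]\n",
    "\n",
    "def slot_for_key(key, slot_ranges):\n",
    "    # Assumption: each range is half-open, start <= key < end.\n",
    "    for slot, (start, end) in enumerate(slot_ranges):\n",
    "        if start <= key < end:\n",
    "            return slot\n",
    "    return None  # key is not covered by any range\n",
    "\n",
    "def sum_pool_tags(pairs, embedding_table):\n",
    "    # Merge one multi tags cell into a single vector: sum of value * embedding.\n",
    "    keys = np.array([k for k, _ in pairs], dtype=np.int64)\n",
    "    values = np.array([v for _, v in pairs], dtype=np.float32)\n",
    "    return (embedding_table[keys] * values[:, None]).sum(axis=0)\n",
    "\n",
    "kv_map = [[1], [1, 300], [2], [300, 700], [3], [700, 1000], [4], [1000, 1500]]\n",
    "ranges = build_slot_ranges(kv_map)\n",
    "print([slot_for_key(k, ranges) for k in (171, 558, 987)])  # [0, 1, 2]\n",
    "\n",
    "# Toy table: multi_tags_max_size = 50 rows, embedding_size = 8 columns.\n",
    "table = np.arange(50 * 8, dtype=np.float32).reshape(50, 8)\n",
    "pooled = sum_pool_tags([(3, 0.5), (4, 0.3)], table)  # multi_tags1 of the first sample row\n",
    "print(pooled.shape)  # (8,)\n",
    "```\n",
    "\n",
    "Keys 171, 558 and 987 are the first three plain kv keys of the first row in the prediction output at the end of this notebook; each falls into a different range, as the note on mutually exclusive keys requires.\n",
    "\n",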
    "## Supported platforms\n",
    "\n",
    "|Platform|Read from OSS|Read from MaxCompute|Read local files|\n",
    "|----|----|----|----|\n",
    "|DSW|No|No|Yes|\n",
    "|Open source|No|No|Yes|\n",
    "|PAI Studio (MaxCompute)|Yes|Yes|No|\n",
    "\n",
    "***Note***: the runtime parameters for the MaxCompute platform are described in a later section."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Model parameters\n",
    "\n",
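    "|Parameter|Description|Notes|\n",
    "|------|------|-----|\n",
    "|input_name|Input data||\n",
    "|feature_cols|Plain kv feature columns|One or more, kv format|\n",
    "|multi_tags_cols|Multi tags feature columns|Zero or more, kv format|\n",
    "|kv_map|kv mapping|Python literal, e.g. [[[1],[1,100]],[[2],[101,30000]]]|\n",
    "|feature_max_size|Maximum key size of plain kv features||\n",
    "|multi_tags_max_size|Maximum key size of multi tags features|Shared by all multi tags columns|\n",
    "|label_col_name|Name of the label column|Label values are 0 or 1|\n",
    "|kvs_delimiter|Delimiter between kv pairs|Default: comma|\n",
    "|kv_delimiter|Delimiter between key and value|Default: colon|\n",
    "|checkpointDir|OSS path for checkpoints|Must be authorized via ARN|\n",
    "|output_name|Output data|Used only in predict mode|\n",
    "|mode|Mode|train or predict|\n",
    "|use_fm|Whether to use the FM component|True or False|\n",
    "|use_deep|Whether to use the deep component|True or False|\n",
    "|sync_type|Synchronization mode|\"async\" or \"sync\"|\n",
    "|embedding_size|Embedding size|Default: 8|\n",
    "|num_steps|Maximum number of training steps|Default: 100,000,000|\n",
    "|epoch|Maximum number of epochs|Default: 1|\n",
    "|dropout_fm|Dropout for the FM part|Python list, default [1.0, 1.0]; must contain exactly 2 values|\n",
    "|deep_layers|Hidden layers of the deep part|Python list, default [32, 32]; add elements for more layers or raise values for more units per layer|\n",
    "|dropout_deep|Dropout for the deep part|Python list, default [0.5, 0.5, 0.5]; must contain len(deep_layers) + 1 elements|\n",
    "|batch_size|Batch size|Default: 128|\n",
    "|learning_rate|Learning rate|Default: 0.001|\n",
    "|optimizer_type|Optimizer type|Default: adam; one of adam, adagrad, gd, momentum|\n",
    "|batch_norm|Whether to enable batch normalization|True or False|\n",
    "|batch_norm_decay|Batch normalization decay|Default: 0.995|\n",
    "|l2_reg|L2 regularization coefficient|Default: 0.01|\n",
    "|log_verbose|Verbose logging|True or False|\n",
    "|adam_beta1|Adam optimizer parameter|Default: 0.9|\n",
    "|adam_beta2|Adam optimizer parameter|Default: 0.99|\n",
    "|adam_epsilon|Adam optimizer parameter|Default: 1e-8|\n",
    "|adagrad_initial_accumulator_value|Adagrad optimizer parameter|Default: 1e-8|\n",
    "|momentum|Momentum optimizer parameter|Default: 0.95|\n",
    "|random_seed|Random seed|Default: 123456. In distributed asynchronous mode, graph construction is deterministic, but computed results are still random|\n",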
    "\n",
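    "***Note***: the parameters above apply both when calling the DeepFM interface and as PAI Studio runtime parameters. The notebook cells below, however, pass `checkpoint_dir`, `verbose` and `multi_tags_col_name` to `DeepFM(...)` where this table lists `checkpointDir`, `log_verbose` and `multi_tags_cols`, and they give `kv_map` in the flat form `[[1],[1,300],[2],[300,700],...]`."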
   ]
  },
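  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The length rules in the table are easy to break when changing `deep_layers`, so it can help to check a parameter dict before building the model. The function below is a hypothetical helper, not part of `DeepFM.py`, and encodes only constraints stated in the table. For background: in the upstream tensorflow-DeepFM code these dropout values are keep probabilities, the two `dropout_fm` values apply to the first-order and second-order FM terms, and `dropout_deep` has one value for the embedding input plus one per hidden layer, which explains the required lengths; that this fork keeps those semantics is an assumption.\n",
    "\n",
    "```python\n",
    "VALID_OPTIMIZERS = ('adam', 'adagrad', 'gd', 'momentum')\n",
    "\n",
    "def check_dfm_params(params):\n",
    "    # Return a list of problems found in a DeepFM parameter dict (empty if none).\n",
    "    # Missing keys fall back to the defaults listed in the table.\n",
    "    problems = []\n",
    "    if len(params.get('dropout_fm', [1.0, 1.0])) != 2:\n",
    "        problems.append('dropout_fm must contain exactly 2 values')\n",
    "    deep_layers = params.get('deep_layers', [32, 32])\n",
    "    dropout_deep = params.get('dropout_deep', [0.5, 0.5, 0.5])\n",
    "    if len(dropout_deep) != len(deep_layers) + 1:\n",
    "        problems.append('dropout_deep must contain len(deep_layers) + 1 = %d values'\n",
    "                        % (len(deep_layers) + 1))\n",
    "    if params.get('optimizer_type', 'adam') not in VALID_OPTIMIZERS:\n",
    "        problems.append('optimizer_type must be one of ' + ', '.join(VALID_OPTIMIZERS))\n",
    "    if 'mode' in params and params['mode'] not in ('train', 'predict'):\n",
    "        problems.append('mode must be train or predict')\n",
    "    return problems\n",
    "\n",
    "\n",
    "# Three hidden layers need four dropout_deep values, so this dict is rejected.\n",
    "example = {'deep_layers': [64, 32, 16], 'dropout_deep': [0.5, 0.5, 0.5]}\n",
    "print(check_dfm_params(example))\n",
    "# ['dropout_deep must contain len(deep_layers) + 1 = 4 values']\n",
    "```\n",
    "\n",
    "Calling `check_dfm_params(dfm_params)` before `DeepFM(**dfm_params)` in the cells below would surface such mistakes before the graph is built."
   ]
  },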
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
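    "## Files\n",
    "\n",
    "This implementation consists of two files: `DeepFM.py` implements the model, and `entry.py` is the main entry point (already adapted for PAI Studio).\n",
    "\n",
    "## Initializing and training the model"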
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from __future__ import absolute_import\n",
    "from __future__ import division\n",
    "from __future__ import print_function\n",
    "\n",
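    "import tensorflow as tf\n",
    "\n",
    "# DeepFM.py (the model file in this directory) provides the DeepFM class\n",
    "from DeepFM import *\n",
    "tf.logging.set_verbosity(tf.logging.INFO)"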
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Building the model parameters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "kv_map = [[1],[1,300],[2],[300,700],[3],[700,1000],[4],[1000,1500]]\n",
    "# The input format differs slightly between platforms; this example reads a local file on DSW\n",
    "input_name = \"./train.csv\"\n",
    "# DeepFM parameters\n",
    "dfm_params = {\n",
    "        \"use_fm\": True,\n",
    "        \"use_deep\": True,\n",
    "        \"embedding_size\": 8,\n",
    "        \"dropout_fm\": [0.8,0.8],\n",
    "        \"deep_layers\": [32,32],\n",
    "        \"dropout_deep\": [0.5,0.5,0.5],\n",
    "        \"deep_layers_activation\": tf.nn.relu,\n",
    "        \"batch_size\": 128, \n",
    "        \"learning_rate\": 0.001,\n",
    "        \"optimizer_type\": \"adam\",\n",
    "        \"batch_norm\": False,\n",
    "        \"batch_norm_decay\": 0.995,\n",
    "        \"l2_reg\": 0.0,\n",
    "        \"epoch\": 1,\n",
    "        \"num_steps\": 10, \n",
    "        \"verbose\": True,\n",
    "        \"random_seed\": 1,\n",
    "        \"server\": None,\n",
    "        \"cluster\": None,\n",
    "        \"task_index\": 0, \n",
    "        \"worker_num\": 1,\n",
    "        \"input_name\": input_name,\n",
    "        \"feature_cols\": [\"feature1\"],\n",
    "        \"kv_map\":kv_map,\n",
    "        \"feature_max_size\": 3000,\n",
    "        \"label_col_name\": \"label\",\n",
    "        \"kvs_delimiter\": \"|\",\n",
    "        \"kv_delimiter\": \":\",\n",
    "        \"sync_type\": \"async\",\n",
    "        \"checkpoint_dir\": \"./checkpoint\",\n",
    "        \"mode\": \"train\",\n",
    "        \"output_name\": \"./out\",\n",
    "        \"multi_tags_col_name\": [\"multi_tags1\", \"multi_tags2\"],\n",
    "        \"multi_tags_max_size\":50\n",
    "    }"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Building the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "kv map range start: 1  end: 300  solt: 0\n",
      "kv map range start: 300  end: 700  solt: 1\n",
      "kv map range start: 700  end: 1000  solt: 2\n",
      "kv map range start: 1000  end: 1500  solt: 3\n",
      "select columns : label,feature1,multi_tags1,multi_tags2\n",
      "[2020-03-12 13:58:57,227] [INFO] [11055#MainThread] [../../.local/lib/python2.7/site-packages/tensorflow/python/util/auto_strategy_utils.py:108] Disable Auto Strategy.\n",
      "[2020-03-12 13:58:57,379] [WARNING] [11055#MainThread] [../../.local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py:305] From /home/admin/.local/lib/python2.7/site-packages/tensorflow/python/ops/sparse_ops.py:1165: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.\n",
      "Instructions for updating:\n",
      "Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.\n",
      "[2020-03-12 13:58:59,198] [INFO] [11055#MainThread] [../../.local/lib/python2.7/site-packages/tensorflow/python/ops/summary_op_util.py:77] Summary name avg loss is illegal; using avg_loss instead.\n",
      "model params number: 29671\n",
      "model total size: 118684\n",
      "build graph done\n"
     ]
    }
   ],
   "source": [
    "deepfm = DeepFM(**dfm_params)"
   ]
  },
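  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The build log above reports `model params number: 29671` and `model total size: 118684`. The second number is exactly 4 × 29671, which is consistent with (though not confirmed as) the size in bytes of 32-bit float parameters. The first number can be reproduced with the parameter layout of the upstream tensorflow-DeepFM model, assuming that the fork keeps that layout, that each of the 4 kv map slots and each of the 2 multi tags columns counts as one field (`field_size = 6`), and that no separate multi tags embedding table is counted. The helper name is hypothetical, and the sketch only explains the logged number; it is not a description of `DeepFM.py`.\n",
    "\n",
    "```python\n",
    "def upstream_deepfm_param_count(feature_size, field_size, embedding_size, deep_layers):\n",
    "    # Upstream tensorflow-DeepFM layout with use_fm and use_deep on, batch_norm off.\n",
    "    count = feature_size * embedding_size     # feature_embeddings\n",
    "    count += feature_size                     # feature_bias (first-order weights)\n",
    "    input_size = field_size * embedding_size  # flattened embeddings fed to the DNN\n",
    "    for units in deep_layers:\n",
    "        count += input_size * units + units   # dense weights + bias\n",
    "        input_size = units\n",
    "    concat_size = field_size + embedding_size + deep_layers[-1]\n",
    "    count += concat_size + 1                  # concat_projection + scalar concat_bias\n",
    "    return count\n",
    "\n",
    "\n",
    "# feature_max_size = 3000, embedding_size = 8, deep_layers = [32, 32] as in dfm_params.\n",
    "n = upstream_deepfm_param_count(feature_size=3000, field_size=6,\n",
    "                                embedding_size=8, deep_layers=[32, 32])\n",
    "print(n, n * 4)  # 29671 118684\n",
    "```"
   ]
  },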
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Training the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[2020-03-12 13:59:44,323] [INFO] [11055#MainThread] [../../.local/lib/python2.7/site-packages/tensorflow/python/training/basic_session_run_hooks.py:529] Create CheckpointSaverHook.\n",
      "[2020-03-12 13:59:44,325] [INFO] [11055#MainThread] [../../.local/lib/python2.7/site-packages/tensorflow/python/training/basic_session_run_hooks.py:544] Init incremental saver , incremental_save:False, incremental_path:./checkpoint/.incremental_checkpoint/incremental_model.ckpt\n",
      "[2020-03-12 13:59:44,326] [INFO] [11055#MainThread] [../../.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py:234] Graph was finalized.\n",
      "[2020-03-12 13:59:45,396] [INFO] [11055#MainThread] [../../.local/lib/python2.7/site-packages/tensorflow/python/training/session_manager.py:507] Running local_init_op.\n",
      "[2020-03-12 13:59:45,418] [INFO] [11055#MainThread] [../../.local/lib/python2.7/site-packages/tensorflow/python/training/session_manager.py:509] Done running local_init_op.\n",
      "[2020-03-12 13:59:50,275] [INFO] [11055#MainThread] [../../.local/lib/python2.7/site-packages/tensorflow/python/training/basic_session_run_hooks.py:616] Saving checkpoints for 0 into ./checkpoint/model.ckpt.\n",
      "[2020-03-12 13:59:52,422] [INFO] [11055#MainThread] [../../.local/lib/python2.7/site-packages/tensorflow/python/training/basic_session_run_hooks.py:578] Create incremental timer, incremental_save:False, incremental_save_secs:None\n",
      "[2020-03-12 13:59:52,919] [INFO] [11055#MainThread] [DeepFM.py:686] global steps: 1, loss: 0.70488054, positive sample num: 67\n",
      "[2020-03-12 13:59:53,333] [INFO] [11055#MainThread] [DeepFM.py:686] global steps: 2, loss: 0.719328, positive sample num: 60\n",
      "[2020-03-12 13:59:53,358] [INFO] [11055#MainThread] [DeepFM.py:686] global steps: 3, loss: 0.6891791, positive sample num: 52\n",
      "[2020-03-12 13:59:53,381] [INFO] [11055#MainThread] [DeepFM.py:686] global steps: 4, loss: 0.6929001, positive sample num: 60\n",
      "[2020-03-12 13:59:53,405] [INFO] [11055#MainThread] [DeepFM.py:686] global steps: 5, loss: 0.7147689, positive sample num: 68\n",
      "[2020-03-12 13:59:53,428] [INFO] [11055#MainThread] [DeepFM.py:686] global steps: 6, loss: 0.7097136, positive sample num: 69\n",
      "[2020-03-12 13:59:53,451] [INFO] [11055#MainThread] [DeepFM.py:686] global steps: 7, loss: 0.7056052, positive sample num: 69\n",
      "[2020-03-12 13:59:53,473] [INFO] [11055#MainThread] [DeepFM.py:686] global steps: 8, loss: 0.70204127, positive sample num: 62\n",
      "[2020-03-12 13:59:53,495] [INFO] [11055#MainThread] [DeepFM.py:686] global steps: 9, loss: 0.69321454, positive sample num: 62\n",
      "[2020-03-12 13:59:53,518] [INFO] [11055#MainThread] [DeepFM.py:686] global steps: 10, loss: 0.72571313, positive sample num: 74\n",
      "[2020-03-12 13:59:53,522] [INFO] [11055#MainThread] [../../.local/lib/python2.7/site-packages/tensorflow/python/training/basic_session_run_hooks.py:616] Saving checkpoints for 10 into ./checkpoint/model.ckpt.\n",
      "[2020-03-12 13:59:55,574] [INFO] [11055#MainThread] [DeepFM.py:690] end of train/evaluate\n",
      "kv map range start: 1  end: 300  solt: 0\n",
      "kv map range start: 300  end: 700  solt: 1\n",
      "kv map range start: 700  end: 1000  solt: 2\n",
      "kv map range start: 1000  end: 1500  solt: 3\n",
      "[2020-03-12 13:59:56,663] [INFO] [11055#MainThread] [../../.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py:1675] Restoring parameters from ./checkpoint/model.ckpt-10\n",
      "[2020-03-12 13:59:56,967] [INFO] [11055#MainThread] [../../.local/lib/python2.7/site-packages/tensorflow/python/saved_model/builder_impl.py:518] Assets added to graph.\n",
      "[2020-03-12 13:59:56,970] [INFO] [11055#MainThread] [../../.local/lib/python2.7/site-packages/tensorflow/python/saved_model/builder_impl.py:121] No assets to write.\n",
      "[2020-03-12 13:59:58,042] [INFO] [11055#MainThread] [../../.local/lib/python2.7/site-packages/tensorflow/python/saved_model/builder_impl.py:472] SavedModel written to: ./checkpoint/saved_model/20200312-135956/saved_model.pb\n",
      "[2020-03-12 13:59:58,044] [INFO] [11055#MainThread] [DeepFM.py:773] end of export_saved_model\n"
     ]
    }
   ],
   "source": [
    "deepfm.train_and_evaluate()"
   ]
  },
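  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With only `num_steps = 10`, the logged losses (about 0.69 to 0.73) stay close to what an uninformed model scores. Under binary log loss, predicting a batch's own positive rate $p$ for every sample gives the binary entropy $-\\left(p \\ln p + (1-p)\\ln(1-p)\\right)$, which is at most $\\ln 2 \\approx 0.693$. The sketch below computes this baseline from the logged `positive sample num`, assuming every logged batch holds `batch_size = 128` samples and that the logged loss is plain log loss (`l2_reg` is 0.0 in this run). The helper name is hypothetical.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "def constant_prediction_logloss(positives, batch_size):\n",
    "    # Log loss when every sample is scored with the batch's positive rate p.\n",
    "    p = positives / float(batch_size)\n",
    "    return -(p * math.log(p) + (1 - p) * math.log(1 - p))\n",
    "\n",
    "\n",
    "# (step, loss, positive sample num) copied from the training log above.\n",
    "logged = [(1, 0.70488054, 67), (5, 0.7147689, 68), (10, 0.72571313, 74)]\n",
    "for step, loss, positives in logged:\n",
    "    baseline = constant_prediction_logloss(positives, 128)\n",
    "    print('step %d: loss %.4f, constant-rate baseline %.4f' % (step, loss, baseline))\n",
    "```\n",
    "\n",
    "Losses at or above this baseline are unsurprising after 10 steps; a model that has learned something useful should eventually score below it."
   ]
  },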
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Predicting with the model"
   ]
  },
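  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The prediction cell below repeats the whole training dictionary and changes only `input_name`, `output_name` and `mode`. As an alternative, the prediction parameters could be derived from the training ones so that model settings such as `kv_map`, `embedding_size` and `deep_layers` cannot drift apart between the two runs. `make_predict_params` is a hypothetical helper, and the trimmed dict here stands in for the full training `dfm_params`.\n",
    "\n",
    "```python\n",
    "def make_predict_params(train_params, input_name, output_name):\n",
    "    # Shallow-copy the training configuration and override only what prediction needs.\n",
    "    params = dict(train_params)\n",
    "    params.update(mode='predict', input_name=input_name, output_name=output_name)\n",
    "    return params\n",
    "\n",
    "\n",
    "train_params = {'embedding_size': 8, 'deep_layers': [32, 32],\n",
    "                'checkpoint_dir': './checkpoint', 'mode': 'train',\n",
    "                'input_name': './train.csv', 'output_name': './out'}\n",
    "predict_params = make_predict_params(train_params, './test.csv', './result.csv')\n",
    "print(predict_params['mode'], predict_params['input_name'])  # predict ./test.csv\n",
    "```\n",
    "\n",
    "In the notebook this would read `DeepFM(**make_predict_params(dfm_params, './test.csv', './result.csv'))`, run while `dfm_params` still holds the training settings."
   ]
  },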
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "kv map range start: 1  end: 300  solt: 0\n",
      "kv map range start: 300  end: 700  solt: 1\n",
      "kv map range start: 700  end: 1000  solt: 2\n",
      "kv map range start: 1000  end: 1500  solt: 3\n",
      "select columns : label,feature1,multi_tags1,multi_tags2\n",
      "build graph done\n",
      "[2020-03-12 14:02:56,123] [INFO] [11055#MainThread] [../../.local/lib/python2.7/site-packages/tensorflow/python/training/basic_session_run_hooks.py:529] Create CheckpointSaverHook.\n",
      "[2020-03-12 14:02:56,125] [INFO] [11055#MainThread] [../../.local/lib/python2.7/site-packages/tensorflow/python/training/basic_session_run_hooks.py:544] Init incremental saver , incremental_save:False, incremental_path:./checkpoint/.incremental_checkpoint/incremental_model.ckpt\n",
      "[2020-03-12 14:02:56,270] [INFO] [11055#MainThread] [../../.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py:234] Graph was finalized.\n",
      "[2020-03-12 14:02:56,545] [INFO] [11055#MainThread] [../../.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py:1675] Restoring parameters from ./checkpoint/model.ckpt-10\n",
      "[2020-03-12 14:02:56,590] [INFO] [11055#MainThread] [../../.local/lib/python2.7/site-packages/tensorflow/python/training/session_manager.py:507] Running local_init_op.\n",
      "[2020-03-12 14:02:56,604] [INFO] [11055#MainThread] [../../.local/lib/python2.7/site-packages/tensorflow/python/training/session_manager.py:509] Done running local_init_op.\n",
      "[2020-03-12 14:02:58,855] [INFO] [11055#MainThread] [../../.local/lib/python2.7/site-packages/tensorflow/python/training/basic_session_run_hooks.py:616] Saving checkpoints for 10 into ./checkpoint/model.ckpt.\n",
      "[2020-03-12 14:03:00,138] [INFO] [11055#MainThread] [../../.local/lib/python2.7/site-packages/tensorflow/python/training/basic_session_run_hooks.py:578] Create incremental timer, incremental_save:False, incremental_save_secs:None\n",
      "[2020-03-12 14:03:00,347] [INFO] [11055#MainThread] [DeepFM.py:721] end of predict\n"
     ]
    }
   ],
   "source": [
    "kv_map = [[1],[1,300],[2],[300,700],[3],[700,1000],[4],[1000,1500]]\n",
    "# Input file for prediction\n",
    "input_name = \"./test.csv\"\n",
    "output_name = \"./result.csv\"\n",
    "# DeepFM parameters; note that mode is set to predict\n",
    "dfm_params = {\n",
    "        \"use_fm\": True,\n",
    "        \"use_deep\": True,\n",
    "        \"embedding_size\": 8,\n",
    "        \"dropout_fm\": [0.8,0.8],\n",
    "        \"deep_layers\": [32,32],\n",
    "        \"dropout_deep\": [0.5,0.5,0.5],\n",
    "        \"deep_layers_activation\": tf.nn.relu,\n",
    "        \"batch_size\": 128, \n",
    "        \"learning_rate\": 0.001,\n",
    "        \"optimizer_type\": \"adam\",\n",
    "        \"batch_norm\": False,\n",
    "        \"batch_norm_decay\": 0.995,\n",
    "        \"l2_reg\": 0.0,\n",
    "        \"epoch\": 1,\n",
    "        \"num_steps\": 10, \n",
    "        \"verbose\": True,\n",
    "        \"random_seed\": 1,\n",
    "        \"server\": None,\n",
    "        \"cluster\": None,\n",
    "        \"task_index\": 0, \n",
    "        \"worker_num\": 1,\n",
    "        \"input_name\": input_name,\n",
    "        \"feature_cols\": [\"feature1\"],\n",
    "        \"kv_map\":kv_map,\n",
    "        \"feature_max_size\": 3000,\n",
    "        \"label_col_name\": \"label\",\n",
    "        \"kvs_delimiter\": \"|\",\n",
    "        \"kv_delimiter\": \":\",\n",
    "        \"sync_type\": \"async\",\n",
    "        \"checkpoint_dir\": \"./checkpoint\",\n",
    "        \"mode\": \"predict\",\n",
    "        \"output_name\": output_name,\n",
    "        \"multi_tags_col_name\": [\"multi_tags1\", \"multi_tags2\"],\n",
    "        \"multi_tags_max_size\":50\n",
    "    }\n",
    "# The model is restored automatically from checkpoint_dir before predicting\n",
    "deepfm = DeepFM(**dfm_params)\n",
    "deepfm.predict()"
   ]
  },
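  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the prediction table displayed below, column 0 appears to hold the predicted probability and column 4 the label. With those two columns, ranking quality can be checked with AUC. The sketch computes AUC from average ranks (the Mann-Whitney formulation), so it needs no extra library; `roc_auc` is a hypothetical helper, and reading `result.csv` is left out because its exact layout (delimiter, header) is not documented in this notebook.\n",
    "\n",
    "```python\n",
    "def roc_auc(scores, labels):\n",
    "    # Probability that a random positive outranks a random negative (ties count half),\n",
    "    # computed from 1-based average ranks. Needs at least one positive and one negative.\n",
    "    order = sorted(range(len(scores)), key=lambda i: scores[i])\n",
    "    ranks = [0.0] * len(scores)\n",
    "    i = 0\n",
    "    while i < len(order):\n",
    "        j = i\n",
    "        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:\n",
    "            j += 1\n",
    "        for k in range(i, j + 1):\n",
    "            ranks[order[k]] = (i + j) / 2.0 + 1  # average rank of the tie group\n",
    "        i = j + 1\n",
    "    n_pos = sum(1 for y in labels if y == 1)\n",
    "    n_neg = len(labels) - n_pos\n",
    "    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)\n",
    "    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)\n",
    "\n",
    "\n",
    "# First five rows of the table below: (column 0, column 4).\n",
    "rows = [(0.476935, 1.0), (0.502716, 0.0), (0.455586, 1.0), (0.451006, 0.0), (0.483660, 0.0)]\n",
    "scores = [s for s, _ in rows]\n",
    "labels = [y for _, y in rows]\n",
    "print('AUC on these rows: %.4f' % roc_auc(scores, labels))\n",
    "```\n",
    "\n",
    "Five rows are far too few for a meaningful estimate; the same function applies unchanged to all rows of the prediction output."
   ]
  },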
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.476935</td>\n",
       "      <td>171:0.160769187967|558:0.762985057103|987:0.46...</td>\n",
       "      <td>7:0.551860799509|2:0.516122332872</td>\n",
       "      <td>3:0.578531033382|10:0.912359976594</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.502716</td>\n",
       "      <td>125:0.631063165963|424:0.610401693013|982:0.30...</td>\n",
       "      <td>6:0.850742992565|5:0.452926581219|4:0.74618124...</td>\n",
       "      <td>5:0.733476495488|4:0.746721584349</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.455586</td>\n",
       "      <td>226:0.590334677243|601:0.353207570678|972:0.87...</td>\n",
       "      <td>1:0.899031815365|3:0.287916015904</td>\n",
       "      <td>6:0.792869990048|1:0.806770056389</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.451006</td>\n",
       "      <td>187:0.167603031484|338:0.0603367158994|790:0.5...</td>\n",
       "      <td>7:0.325278910648</td>\n",
       "      <td>14:0.214951601542</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.483660</td>\n",
       "      <td>239:0.749947232634|630:0.832378526229|847:0.49...</td>\n",
       "      <td>9:0.501711706914|5:0.575738108738</td>\n",
       "      <td>3:0.466092055001|13:0.709491003408</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0.479454</td>\n",
       "      <td>203:0.222651807156|481:0.567848503189|731:0.77...</td>\n",
       "      <td>5:0.285600906477</td>\n",
       "      <td>11:0.493049264151|2:0.158095697727|11:0.754516...</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>0.431707</td>\n",
       "      <td>59:0.855379126162|513:0.141471919663|783:0.409...</td>\n",
       "      <td>6:0.817890670226</td>\n",
       "      <td>4:0.364886715962|14:0.784435436416|10:0.043269...</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>0.462653</td>\n",
       "      <td>20:0.131093229566|677:0.560840054375|908:0.737...</td>\n",
       "      <td>4:0.709533662447</td>\n",
       "      <td>10:0.974380757476</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>0.487114</td>\n",
       "      <td>172:0.766779876387|584:0.66569748063|989:0.355...</td>\n",
       "      <td>6:0.114119799904</td>\n",
       "      <td>2:0.508507829738</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>0.469684</td>\n",
       "      <td>37:0.216288069064|361:0.270378044031|813:0.376...</td>\n",
       "      <td>5:0.15777133173|4:0.139784393334</td>\n",
       "      <td>6:0.494135082626</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>0.504351</td>\n",
       "      <td>105:0.926029460978|300:0.393425065771|743:0.30...</td>\n",
       "      <td>9:0.669337345955|7:0.441819620127</td>\n",
       "      <td>7:0.485383148462|3:0.125428390011|10:0.2036683...</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>0.462543</td>\n",
       "      <td>132:0.212923272863|512:0.929013588064|767:0.14...</td>\n",
       "      <td>6:0.110938869816|8:0.54415660918</td>\n",
       "      <td>13:0.668113977213</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>0.484072</td>\n",
       "      <td>219:0.881813730397|389:0.561590207114|941:0.82...</td>\n",
       "      <td>3:0.326593454924|6:0.985041938454|8:0.85089269...</td>\n",
       "      <td>2:0.513292357115</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>0.497101</td>\n",
       "      <td>236:0.431104177581|457:0.537189291023|883:0.73...</td>\n",
       "      <td>2:0.860980822171|4:0.835194645965|7:0.94994479...</td>\n",
       "      <td>5:0.305789281449|1:0.0754994554699|12:0.892406...</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>0.494491</td>\n",
       "      <td>18:0.423662145993|432:0.0376770741931|804:0.52...</td>\n",
       "      <td>4:0.612642228162</td>\n",
       "      <td>12:0.474166319997|12:0.234704205978</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>0.483268</td>\n",
       "      <td>226:0.126308385983|619:0.172374653605|948:0.70...</td>\n",
       "      <td>1:0.736998750902|5:0.865735323506</td>\n",
       "      <td>13:0.538441443708|12:0.898462163417|11:0.54271...</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>0.498694</td>\n",
       "      <td>95:0.00745487544264|474:0.487908071243|932:0.2...</td>\n",
       "      <td>1:0.562970994328</td>\n",
       "      <td>5:0.190344207776|13:0.485808245185</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>0.482705</td>\n",
       "      <td>182:0.0485782591081|568:0.322526924582|863:0.1...</td>\n",
       "      <td>1:0.203530688302|6:0.708283835359|5:0.11347919...</td>\n",
       "      <td>12:0.780980740861</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>0.472208</td>\n",
       "      <td>155:0.22277532867|497:0.842596723995|805:0.524...</td>\n",
       "      <td>9:0.46877523338|6:0.578300316531|7:0.69566161426</td>\n",
       "      <td>12:0.0187166419041|7:0.892443192656</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>0.431964</td>\n",
       "      <td>12:0.274281719682|302:0.713962078485|972:0.671...</td>\n",
       "      <td>1:0.64778739726|9:0.923251278452</td>\n",
       "      <td>7:0.964178415721</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>0.414114</td>\n",
       "      <td>45:0.230584592298|608:0.106800252251|875:0.855...</td>\n",
       "      <td>1:0.490872919869|3:0.424561179125|8:0.99783837...</td>\n",
       "      <td>10:0.597465357038|7:0.643510796079|11:0.243224...</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>0.481062</td>\n",
       "      <td>291:0.881112947611|388:0.697094624091|763:0.22...</td>\n",
       "      <td>9:0.781165342323|1:0.511852234998</td>\n",
       "      <td>14:0.408191500027|9:0.498295079475</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>0.436182</td>\n",
       "      <td>46:0.940124063769|554:0.767308958233|861:0.855...</td>\n",
       "      <td>9:0.875484521907|3:0.106273907166|9:0.84651772...</td>\n",
       "      <td>8:0.149560752738</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>0.482011</td>\n",
       "      <td>220:0.478327795879|337:0.819628333503|848:0.22...</td>\n",
       "      <td>2:0.0910865169432</td>\n",
       "      <td>14:0.678797084474</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>0.438655</td>\n",
       "      <td>289:0.644933508368|496:0.522363900655|756:0.89...</td>\n",
       "      <td>2:0.396165025216|6:0.141882798936</td>\n",
       "      <td>14:0.176257252267|8:0.72011557031</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>0.453735</td>\n",
       "      <td>280:0.429478831957|649:0.0741628134484|994:0.1...</td>\n",
       "      <td>1:0.396894170838|5:0.469937185599|8:0.08335340...</td>\n",
       "      <td>9:0.0835741235123</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>0.442530</td>\n",
       "      <td>73:0.633517359822|685:0.850950435963|742:0.365...</td>\n",
       "      <td>4:0.625446694804|3:0.774047123178|9:0.13687904...</td>\n",
       "      <td>13:0.0103050329416|14:0.571122119363|1:0.59162...</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>0.503040</td>\n",
       "      <td>259:0.452990065406|412:0.438894049519|707:0.13...</td>\n",
       "      <td>9:0.283803173315|6:0.988080833073|9:0.33548057...</td>\n",
       "      <td>12:0.990657592627|4:0.894966012858|6:0.3458076...</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>0.489070</td>\n",
       "      <td>189:0.515408362614|583:0.847338890072|933:0.74...</td>\n",
       "      <td>8:0.661055519968|2:0.0869554540502</td>\n",
       "      <td>11:0.108245453964</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>0.452601</td>\n",
       "      <td>267:0.2764957277|540:0.862721062732|871:0.4012...</td>\n",
       "      <td>4:0.892649882639|8:0.309380654102|2:0.46675490...</td>\n",
       "      <td>9:0.943323635227|9:0.557961119858</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>225</th>\n",
       "      <td>0.450036</td>\n",
       "      <td>164:0.48606606281|524:0.0664395790728|827:0.20...</td>\n",
       "      <td>9:0.408267990501</td>\n",
       "      <td>1:0.46199767607</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>226</th>\n",
       "      <td>0.452397</td>\n",
       "      <td>184:0.811857124518|601:0.501544244258|868:0.30...</td>\n",
       "      <td>1:0.990090617662|2:0.0508944627246|7:0.5966133...</td>\n",
       "      <td>10:0.978972651497|3:0.209383502072|8:0.7888135...</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>227</th>\n",
       "      <td>0.459951</td>\n",
       "      <td>234:0.410493593719|448:0.370599772678|914:0.74...</td>\n",
       "      <td>7:0.787327194085|9:0.0625955677636|7:0.7644850...</td>\n",
       "      <td>2:0.479999702701</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>228</th>\n",
       "      <td>0.481374</td>\n",
       "      <td>39:0.224828991629|632:0.813172756996|968:0.441...</td>\n",
       "      <td>2:0.67455819547|7:0.573889925555</td>\n",
       "      <td>4:0.660022642555|10:0.275652845599</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>229</th>\n",
       "      <td>0.509814</td>\n",
       "      <td>125:0.0566587761817|324:0.930387856503|930:0.0...</td>\n",
       "      <td>3:0.0500238943987|1:0.370602806628</td>\n",
       "      <td>12:0.326307288436|10:0.700453949854|8:0.947169...</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>230</th>\n",
       "      <td>0.435772</td>\n",
       "      <td>233:0.579346536256|357:0.343210782242|920:0.25...</td>\n",
       "      <td>7:0.515646978022</td>\n",
       "      <td>1:0.394630086024|13:0.529025422622</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>231</th>\n",
       "      <td>0.434688</td>\n",
       "      <td>91:0.637504206647|683:0.08603248324|937:0.7317...</td>\n",
       "      <td>7:0.276502075804|1:0.90511962457|3:0.699301592988</td>\n",
       "      <td>2:0.376408395505|2:0.706855438455</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>232</th>\n",
       "      <td>0.447485</td>\n",
       "      <td>244:0.86862797034|617:0.710922519221|896:0.504...</td>\n",
       "      <td>2:0.336711418242|8:0.0518862568892|5:0.7540033...</td>\n",
       "      <td>3:0.726700247475|2:0.461699827266|14:0.6504424...</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>233</th>\n",
       "      <td>0.461041</td>\n",
       "      <td>69:0.0460119744957|505:0.430919668041|790:0.17...</td>\n",
       "      <td>1:0.839219544275</td>\n",
       "      <td>4:0.58830948779|8:0.291070473629</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>234</th>\n",
       "      <td>0.460970</td>\n",
       "      <td>127:0.164357746569|495:0.774153499762|991:0.74...</td>\n",
       "      <td>8:0.473387354955|9:0.472697453821|1:0.13522116...</td>\n",
       "      <td>13:0.138123222229</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>235</th>\n",
       "      <td>0.434069</td>\n",
       "      <td>132:0.758173614244|694:0.954909467072|911:0.92...</td>\n",
       "      <td>1:0.379712201676|3:0.862885557234</td>\n",
       "      <td>1:0.334967286474|6:0.893004395719|11:0.0087466...</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>236</th>\n",
       "      <td>0.436683</td>\n",
       "      <td>205:0.355816944511|397:0.442471729102|885:0.80...</td>\n",
       "      <td>6:0.231459272107|2:0.930483555941</td>\n",
       "      <td>6:0.407849624731</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>237</th>\n",
       "      <td>0.388894</td>\n",
       "      <td>209:0.0655532169628|354:0.616598372639|809:0.8...</td>\n",
       "      <td>1:0.535007909584|4:0.543682239263</td>\n",
       "      <td>8:0.183959696714|13:0.712212525394</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>238</th>\n",
       "      <td>0.507626</td>\n",
       "      <td>15:0.277881127767|499:0.485317663447|971:0.483...</td>\n",
       "      <td>9:0.664618080331|6:0.384678136242|4:0.72219667...</td>\n",
       "      <td>3:0.0753881426679|1:0.866104367784|8:0.7197700...</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>239</th>\n",
       "      <td>0.480848</td>\n",
       "      <td>73:0.955428918106|389:0.200916294324|985:0.064...</td>\n",
       "      <td>7:0.589251667164|2:0.0393569279887|4:0.7021002...</td>\n",
       "      <td>2:0.417333771496|12:0.234329690039|11:0.269321...</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>240</th>\n",
       "      <td>0.451609</td>\n",
       "      <td>88:0.135110669463|553:0.605501820308|939:0.598...</td>\n",
       "      <td>8:0.764873072288</td>\n",
       "      <td>12:0.217304469367|9:0.990745754175|6:0.3723820...</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>241</th>\n",
       "      <td>0.442800</td>\n",
       "      <td>285:0.269259576737|618:0.712146864369|943:0.63...</td>\n",
       "      <td>1:0.672745193685|2:0.287418751747</td>\n",
       "      <td>9:0.218212746376|7:0.344106886949|4:0.40220419...</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>242</th>\n",
       "      <td>0.488033</td>\n",
       "      <td>128:0.418997868081|403:0.230868011018|811:0.03...</td>\n",
       "      <td>8:0.1985029077</td>\n",
       "      <td>3:0.465511944834|12:0.885322027896|7:0.3414949...</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>243</th>\n",
       "      <td>0.490599</td>\n",
       "      <td>263:0.98111904057|379:0.977103464284|794:0.386...</td>\n",
       "      <td>9:0.728230889944|3:0.499358604539</td>\n",
       "      <td>1:0.238674639681|13:0.285487791641|14:0.327618...</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>244</th>\n",
       "      <td>0.421432</td>\n",
       "      <td>170:0.637435689776|462:0.388204875153|849:0.00...</td>\n",
       "      <td>2:0.393617462567</td>\n",
       "      <td>9:0.767674382034</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>245</th>\n",
       "      <td>0.457375</td>\n",
       "      <td>214:0.575873417893|551:0.0834704970276|778:0.8...</td>\n",
       "      <td>6:0.218998483555|3:0.190376669588</td>\n",
       "      <td>9:0.311822075071|6:0.228856040316</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>246</th>\n",
       "      <td>0.481183</td>\n",
       "      <td>286:0.551291674738|470:0.460358226614|742:0.03...</td>\n",
       "      <td>7:0.391360517581</td>\n",
       "      <td>8:0.615597506946|13:0.351011596302|3:0.8067591...</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>247</th>\n",
       "      <td>0.436684</td>\n",
       "      <td>84:0.844108078692|466:0.113414173787|856:0.213...</td>\n",
       "      <td>5:0.654977564405|2:0.396221156739</td>\n",
       "      <td>1:0.960945622393|9:0.711878079503|2:0.28017937...</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>248</th>\n",
       "      <td>0.489178</td>\n",
       "      <td>228:0.650889380535|412:0.703263233734|817:0.22...</td>\n",
       "      <td>9:0.966998345727|7:0.70175663059|8:0.689030269113</td>\n",
       "      <td>8:0.944684496518|6:0.70980737384|12:0.31650760538</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>249</th>\n",
       "      <td>0.484398</td>\n",
       "      <td>224:0.434757311537|436:0.184268567262|724:0.09...</td>\n",
       "      <td>3:0.230517932804|6:0.820387636581</td>\n",
       "      <td>5:0.709518474664|8:0.614046146318|2:0.94210871916</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>250</th>\n",
       "      <td>0.475585</td>\n",
       "      <td>19:0.786168915933|509:0.983596879903|703:0.680...</td>\n",
       "      <td>4:0.853539905342|9:0.71589946311|1:0.863962079645</td>\n",
       "      <td>10:0.836096433884|2:0.517806532664|6:0.3798552...</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>251</th>\n",
       "      <td>0.489042</td>\n",
       "      <td>259:0.326885787725|418:0.834644989373|835:0.46...</td>\n",
       "      <td>1:0.204952158966|1:0.48422919874</td>\n",
       "      <td>8:0.317274233788|7:0.369718174861</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>252</th>\n",
       "      <td>0.454357</td>\n",
       "      <td>41:0.672575824614|536:0.300162592117|751:0.897...</td>\n",
       "      <td>2:0.267648584056|8:0.513560090159|4:0.35904084...</td>\n",
       "      <td>5:0.289564700208|13:0.0231099166008|3:0.647822...</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>253</th>\n",
       "      <td>0.475810</td>\n",
       "      <td>252:0.968384534048|604:0.125839266577|844:0.43...</td>\n",
       "      <td>8:0.172312273857|4:0.305775663587|5:0.63092205...</td>\n",
       "      <td>2:0.0804582337542|11:0.255994952622|13:0.71835...</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>254</th>\n",
       "      <td>0.492914</td>\n",
       "      <td>213:0.214543696623|530:0.970990244603|972:0.08...</td>\n",
       "      <td>9:0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>255 rows × 5 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "            0                                                  1  \\\n",
       "0    0.476935  171:0.160769187967|558:0.762985057103|987:0.46...   \n",
       "1    0.502716  125:0.631063165963|424:0.610401693013|982:0.30...   \n",
       "2    0.455586  226:0.590334677243|601:0.353207570678|972:0.87...   \n",
       "3    0.451006  187:0.167603031484|338:0.0603367158994|790:0.5...   \n",
       "4    0.483660  239:0.749947232634|630:0.832378526229|847:0.49...   \n",
       "5    0.479454  203:0.222651807156|481:0.567848503189|731:0.77...   \n",
       "6    0.431707  59:0.855379126162|513:0.141471919663|783:0.409...   \n",
       "7    0.462653  20:0.131093229566|677:0.560840054375|908:0.737...   \n",
       "8    0.487114  172:0.766779876387|584:0.66569748063|989:0.355...   \n",
       "9    0.469684  37:0.216288069064|361:0.270378044031|813:0.376...   \n",
       "10   0.504351  105:0.926029460978|300:0.393425065771|743:0.30...   \n",
       "11   0.462543  132:0.212923272863|512:0.929013588064|767:0.14...   \n",
       "12   0.484072  219:0.881813730397|389:0.561590207114|941:0.82...   \n",
       "13   0.497101  236:0.431104177581|457:0.537189291023|883:0.73...   \n",
       "14   0.494491  18:0.423662145993|432:0.0376770741931|804:0.52...   \n",
       "15   0.483268  226:0.126308385983|619:0.172374653605|948:0.70...   \n",
       "16   0.498694  95:0.00745487544264|474:0.487908071243|932:0.2...   \n",
       "17   0.482705  182:0.0485782591081|568:0.322526924582|863:0.1...   \n",
       "18   0.472208  155:0.22277532867|497:0.842596723995|805:0.524...   \n",
       "19   0.431964  12:0.274281719682|302:0.713962078485|972:0.671...   \n",
       "20   0.414114  45:0.230584592298|608:0.106800252251|875:0.855...   \n",
       "21   0.481062  291:0.881112947611|388:0.697094624091|763:0.22...   \n",
       "22   0.436182  46:0.940124063769|554:0.767308958233|861:0.855...   \n",
       "23   0.482011  220:0.478327795879|337:0.819628333503|848:0.22...   \n",
       "24   0.438655  289:0.644933508368|496:0.522363900655|756:0.89...   \n",
       "25   0.453735  280:0.429478831957|649:0.0741628134484|994:0.1...   \n",
       "26   0.442530  73:0.633517359822|685:0.850950435963|742:0.365...   \n",
       "27   0.503040  259:0.452990065406|412:0.438894049519|707:0.13...   \n",
       "28   0.489070  189:0.515408362614|583:0.847338890072|933:0.74...   \n",
       "29   0.452601  267:0.2764957277|540:0.862721062732|871:0.4012...   \n",
       "..        ...                                                ...   \n",
       "225  0.450036  164:0.48606606281|524:0.0664395790728|827:0.20...   \n",
       "226  0.452397  184:0.811857124518|601:0.501544244258|868:0.30...   \n",
       "227  0.459951  234:0.410493593719|448:0.370599772678|914:0.74...   \n",
       "228  0.481374  39:0.224828991629|632:0.813172756996|968:0.441...   \n",
       "229  0.509814  125:0.0566587761817|324:0.930387856503|930:0.0...   \n",
       "230  0.435772  233:0.579346536256|357:0.343210782242|920:0.25...   \n",
       "231  0.434688  91:0.637504206647|683:0.08603248324|937:0.7317...   \n",
       "232  0.447485  244:0.86862797034|617:0.710922519221|896:0.504...   \n",
       "233  0.461041  69:0.0460119744957|505:0.430919668041|790:0.17...   \n",
       "234  0.460970  127:0.164357746569|495:0.774153499762|991:0.74...   \n",
       "235  0.434069  132:0.758173614244|694:0.954909467072|911:0.92...   \n",
       "236  0.436683  205:0.355816944511|397:0.442471729102|885:0.80...   \n",
       "237  0.388894  209:0.0655532169628|354:0.616598372639|809:0.8...   \n",
       "238  0.507626  15:0.277881127767|499:0.485317663447|971:0.483...   \n",
       "239  0.480848  73:0.955428918106|389:0.200916294324|985:0.064...   \n",
       "240  0.451609  88:0.135110669463|553:0.605501820308|939:0.598...   \n",
       "241  0.442800  285:0.269259576737|618:0.712146864369|943:0.63...   \n",
       "242  0.488033  128:0.418997868081|403:0.230868011018|811:0.03...   \n",
       "243  0.490599  263:0.98111904057|379:0.977103464284|794:0.386...   \n",
       "244  0.421432  170:0.637435689776|462:0.388204875153|849:0.00...   \n",
       "245  0.457375  214:0.575873417893|551:0.0834704970276|778:0.8...   \n",
       "246  0.481183  286:0.551291674738|470:0.460358226614|742:0.03...   \n",
       "247  0.436684  84:0.844108078692|466:0.113414173787|856:0.213...   \n",
       "248  0.489178  228:0.650889380535|412:0.703263233734|817:0.22...   \n",
       "249  0.484398  224:0.434757311537|436:0.184268567262|724:0.09...   \n",
       "250  0.475585  19:0.786168915933|509:0.983596879903|703:0.680...   \n",
       "251  0.489042  259:0.326885787725|418:0.834644989373|835:0.46...   \n",
       "252  0.454357  41:0.672575824614|536:0.300162592117|751:0.897...   \n",
       "253  0.475810  252:0.968384534048|604:0.125839266577|844:0.43...   \n",
       "254  0.492914  213:0.214543696623|530:0.970990244603|972:0.08...   \n",
       "\n",
       "                                                     2  \\\n",
       "0                    7:0.551860799509|2:0.516122332872   \n",
       "1    6:0.850742992565|5:0.452926581219|4:0.74618124...   \n",
       "2                    1:0.899031815365|3:0.287916015904   \n",
       "3                                     7:0.325278910648   \n",
       "4                    9:0.501711706914|5:0.575738108738   \n",
       "5                                     5:0.285600906477   \n",
       "6                                     6:0.817890670226   \n",
       "7                                     4:0.709533662447   \n",
       "8                                     6:0.114119799904   \n",
       "9                     5:0.15777133173|4:0.139784393334   \n",
       "10                   9:0.669337345955|7:0.441819620127   \n",
       "11                    6:0.110938869816|8:0.54415660918   \n",
       "12   3:0.326593454924|6:0.985041938454|8:0.85089269...   \n",
       "13   2:0.860980822171|4:0.835194645965|7:0.94994479...   \n",
       "14                                    4:0.612642228162   \n",
       "15                   1:0.736998750902|5:0.865735323506   \n",
       "16                                    1:0.562970994328   \n",
       "17   1:0.203530688302|6:0.708283835359|5:0.11347919...   \n",
       "18    9:0.46877523338|6:0.578300316531|7:0.69566161426   \n",
       "19                    1:0.64778739726|9:0.923251278452   \n",
       "20   1:0.490872919869|3:0.424561179125|8:0.99783837...   \n",
       "21                   9:0.781165342323|1:0.511852234998   \n",
       "22   9:0.875484521907|3:0.106273907166|9:0.84651772...   \n",
       "23                                   2:0.0910865169432   \n",
       "24                   2:0.396165025216|6:0.141882798936   \n",
       "25   1:0.396894170838|5:0.469937185599|8:0.08335340...   \n",
       "26   4:0.625446694804|3:0.774047123178|9:0.13687904...   \n",
       "27   9:0.283803173315|6:0.988080833073|9:0.33548057...   \n",
       "28                  8:0.661055519968|2:0.0869554540502   \n",
       "29   4:0.892649882639|8:0.309380654102|2:0.46675490...   \n",
       "..                                                 ...   \n",
       "225                                   9:0.408267990501   \n",
       "226  1:0.990090617662|2:0.0508944627246|7:0.5966133...   \n",
       "227  7:0.787327194085|9:0.0625955677636|7:0.7644850...   \n",
       "228                   2:0.67455819547|7:0.573889925555   \n",
       "229                 3:0.0500238943987|1:0.370602806628   \n",
       "230                                   7:0.515646978022   \n",
       "231  7:0.276502075804|1:0.90511962457|3:0.699301592988   \n",
       "232  2:0.336711418242|8:0.0518862568892|5:0.7540033...   \n",
       "233                                   1:0.839219544275   \n",
       "234  8:0.473387354955|9:0.472697453821|1:0.13522116...   \n",
       "235                  1:0.379712201676|3:0.862885557234   \n",
       "236                  6:0.231459272107|2:0.930483555941   \n",
       "237                  1:0.535007909584|4:0.543682239263   \n",
       "238  9:0.664618080331|6:0.384678136242|4:0.72219667...   \n",
       "239  7:0.589251667164|2:0.0393569279887|4:0.7021002...   \n",
       "240                                   8:0.764873072288   \n",
       "241                  1:0.672745193685|2:0.287418751747   \n",
       "242                                     8:0.1985029077   \n",
       "243                  9:0.728230889944|3:0.499358604539   \n",
       "244                                   2:0.393617462567   \n",
       "245                  6:0.218998483555|3:0.190376669588   \n",
       "246                                   7:0.391360517581   \n",
       "247                  5:0.654977564405|2:0.396221156739   \n",
       "248  9:0.966998345727|7:0.70175663059|8:0.689030269113   \n",
       "249                  3:0.230517932804|6:0.820387636581   \n",
       "250  4:0.853539905342|9:0.71589946311|1:0.863962079645   \n",
       "251                   1:0.204952158966|1:0.48422919874   \n",
       "252  2:0.267648584056|8:0.513560090159|4:0.35904084...   \n",
       "253  8:0.172312273857|4:0.305775663587|5:0.63092205...   \n",
       "254                                                9:0   \n",
       "\n",
       "                                                     3    4  \n",
       "0                   3:0.578531033382|10:0.912359976594  1.0  \n",
       "1                    5:0.733476495488|4:0.746721584349  0.0  \n",
       "2                    6:0.792869990048|1:0.806770056389  1.0  \n",
       "3                                    14:0.214951601542  0.0  \n",
       "4                   3:0.466092055001|13:0.709491003408  0.0  \n",
       "5    11:0.493049264151|2:0.158095697727|11:0.754516...  1.0  \n",
       "6    4:0.364886715962|14:0.784435436416|10:0.043269...  0.0  \n",
       "7                                    10:0.974380757476  1.0  \n",
       "8                                     2:0.508507829738  1.0  \n",
       "9                                     6:0.494135082626  1.0  \n",
       "10   7:0.485383148462|3:0.125428390011|10:0.2036683...  1.0  \n",
       "11                                   13:0.668113977213  0.0  \n",
       "12                                    2:0.513292357115  1.0  \n",
       "13   5:0.305789281449|1:0.0754994554699|12:0.892406...  1.0  \n",
       "14                 12:0.474166319997|12:0.234704205978  1.0  \n",
       "15   13:0.538441443708|12:0.898462163417|11:0.54271...  1.0  \n",
       "16                  5:0.190344207776|13:0.485808245185  0.0  \n",
       "17                                   12:0.780980740861  1.0  \n",
       "18                 12:0.0187166419041|7:0.892443192656  0.0  \n",
       "19                                    7:0.964178415721  1.0  \n",
       "20   10:0.597465357038|7:0.643510796079|11:0.243224...  0.0  \n",
       "21                  14:0.408191500027|9:0.498295079475  1.0  \n",
       "22                                    8:0.149560752738  0.0  \n",
       "23                                   14:0.678797084474  0.0  \n",
       "24                   14:0.176257252267|8:0.72011557031  1.0  \n",
       "25                                   9:0.0835741235123  1.0  \n",
       "26   13:0.0103050329416|14:0.571122119363|1:0.59162...  0.0  \n",
       "27   12:0.990657592627|4:0.894966012858|6:0.3458076...  1.0  \n",
       "28                                   11:0.108245453964  0.0  \n",
       "29                   9:0.943323635227|9:0.557961119858  1.0  \n",
       "..                                                 ...  ...  \n",
       "225                                    1:0.46199767607  1.0  \n",
       "226  10:0.978972651497|3:0.209383502072|8:0.7888135...  1.0  \n",
       "227                                   2:0.479999702701  1.0  \n",
       "228                 4:0.660022642555|10:0.275652845599  1.0  \n",
       "229  12:0.326307288436|10:0.700453949854|8:0.947169...  0.0  \n",
       "230                 1:0.394630086024|13:0.529025422622  0.0  \n",
       "231                  2:0.376408395505|2:0.706855438455  1.0  \n",
       "232  3:0.726700247475|2:0.461699827266|14:0.6504424...  0.0  \n",
       "233                   4:0.58830948779|8:0.291070473629  1.0  \n",
       "234                                  13:0.138123222229  1.0  \n",
       "235  1:0.334967286474|6:0.893004395719|11:0.0087466...  1.0  \n",
       "236                                   6:0.407849624731  0.0  \n",
       "237                 8:0.183959696714|13:0.712212525394  0.0  \n",
       "238  3:0.0753881426679|1:0.866104367784|8:0.7197700...  0.0  \n",
       "239  2:0.417333771496|12:0.234329690039|11:0.269321...  0.0  \n",
       "240  12:0.217304469367|9:0.990745754175|6:0.3723820...  1.0  \n",
       "241  9:0.218212746376|7:0.344106886949|4:0.40220419...  0.0  \n",
       "242  3:0.465511944834|12:0.885322027896|7:0.3414949...  1.0  \n",
       "243  1:0.238674639681|13:0.285487791641|14:0.327618...  0.0  \n",
       "244                                   9:0.767674382034  1.0  \n",
       "245                  9:0.311822075071|6:0.228856040316  0.0  \n",
       "246  8:0.615597506946|13:0.351011596302|3:0.8067591...  0.0  \n",
       "247  1:0.960945622393|9:0.711878079503|2:0.28017937...  1.0  \n",
       "248  8:0.944684496518|6:0.70980737384|12:0.31650760538  0.0  \n",
       "249  5:0.709518474664|8:0.614046146318|2:0.94210871916  0.0  \n",
       "250  10:0.836096433884|2:0.517806532664|6:0.3798552...  0.0  \n",
       "251                  8:0.317274233788|7:0.369718174861  1.0  \n",
       "252  5:0.289564700208|13:0.0231099166008|3:0.647822...  1.0  \n",
       "253  2:0.0804582337542|11:0.255994952622|13:0.71835...  0.0  \n",
       "254                                                NaN  NaN  \n",
       "\n",
       "[255 rows x 5 columns]"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.read_csv(\"./result.csv\",header=None)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
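    "In the output above, the first column (column 0) is the model score; the remaining columns are the input features copied through unchanged."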
   ]
  },
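  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The feature columns in `result.csv` are kv strings, and for offline analysis they can be split into `(key, value)` pairs. Below is a minimal sketch. The helper name `parse_kv` is hypothetical and not part of this implementation. The `|` pair separator and `:` key/value separator are assumptions read off the output above. The helper returns a list rather than a dict because some cells repeat a key, for example `1:0.204952158966|1:0.48422919874` in row 251.\n",
    "\n",
    "```python\n",
    "def parse_kv(text, pair_sep='|', kv_sep=':'):\n",
    "    # Hypothetical helper: split 'k1:v1|k2:v2' into [(k1, v1), (k2, v2)].\n",
    "    # The separators are assumptions based on the result.csv output above.\n",
    "    pairs = []\n",
    "    if not text or not hasattr(text, 'split'):\n",
    "        # pandas reads empty cells as NaN (a float, which has no split()):\n",
    "        # treat them, and empty strings, as carrying no features.\n",
    "        return pairs\n",
    "    for item in text.split(pair_sep):\n",
    "        key, value = item.split(kv_sep)\n",
    "        pairs.append((int(key), float(value)))\n",
    "    return pairs\n",
    "\n",
    "\n",
    "print(parse_kv('7:0.551860799509|2:0.516122332872'))\n",
    "```\n",
    "\n",
    "Applied to the DataFrame above, `pd.read_csv('./result.csv', header=None)[1].apply(parse_kv)` would parse the first feature column, while column 0 holds the scores."
   ]
  },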
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
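    "## Model Serving\n",
    "\n",
    "After training finishes normally, this implementation exports a SavedModel to the checkpoint dir. When training in PAI Studio, the model can be found under the OSS checkpoint dir path. The model can be served directly with TF Serving or the PAI EAS online service. The exported signature is:"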
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```python\n",
    "inputs[\"features\"] = <all regular kv features>\n",
    "inputs[<multi-tags column name 1>] = <multi-tags kv feature>\n",
    "inputs[<multi-tags column name 2>] = <multi-tags kv feature>\n",
    "...\n",
    "inputs[<multi-tags column name n>] = <multi-tags kv feature>\n",
    "outputs[\"score\"] = <model score>\n",
    "```\n",
    "Here **features** and **score** are fixed names. Each multi-tags column is a separate input, named after the column name used during training. Online prediction is batched: to score a single sample, you must still wrap it as a batch (of size 1)."
   ]
  },
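  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To illustrate the batch requirement, below is a minimal sketch that builds a request body for the TensorFlow Serving REST `predict` API in its columnar `inputs` format. A single sample is wrapped as a batch of size 1. The sketch rests on assumptions that this document does not confirm:\n",
    "\n",
    "- the multi-tags column names `multi_tags1`/`multi_tags2` are hypothetical;\n",
    "- each input takes the raw kv string;\n",
    "- `|` is the pair separator;\n",
    "- the signature is exported as `serving_default`.\n",
    "\n",
    "Check the exported signature (for example with `saved_model_cli show --all --dir <export dir>`) before relying on it. PAI EAS requests may use a different format.\n",
    "\n",
    "```python\n",
    "import json\n",
    "\n",
    "\n",
    "def build_predict_request(samples, multi_tag_columns, signature_name='serving_default'):\n",
    "    # Every named input gets a list with one entry per sample, so even a\n",
    "    # single sample is sent as a batch of size 1.\n",
    "    inputs = {'features': [s['features'] for s in samples]}\n",
    "    for column in multi_tag_columns:\n",
    "        # Each multi-tags column is a separate input named after its column.\n",
    "        inputs[column] = [s[column] for s in samples]\n",
    "    return json.dumps({'signature_name': signature_name, 'inputs': inputs})\n",
    "\n",
    "\n",
    "# Hypothetical sample: feature1 (1:0.1) and feature2 (3:0.2) share the\n",
    "# regular kv key space, so both go into the single 'features' input.\n",
    "sample = {\n",
    "    'features': '1:0.1|3:0.2',\n",
    "    'multi_tags1': '3:0.5|4:0.3',\n",
    "    'multi_tags2': '7:0.1',\n",
    "}\n",
    "body = build_predict_request([sample], ['multi_tags1', 'multi_tags2'])\n",
    "print(body)\n",
    "```\n",
    "\n",
    "The body would be POSTed to `http://<host>:8501/v1/models/<model name>:predict`, and the scores come back in the response's `outputs` field."
   ]
  },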
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
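    "# PAI Studio (MaxCompute)\n",
    "## Usage\n",
    "\n",
    "This implementation is platform-compatible, so training developed in DSW can be migrated seamlessly to PAI Studio (MaxCompute) for large-scale distributed production jobs.\n",
    "Use the `entry.py` provided with this implementation as the program entry point. Package and upload the code, then prepare the data (see above for the data sources each platform accepts). Training can then start.\n",
    "The following example submits the job from DataWorks (the MaxCompute front-end tool) or the odps command line:\n",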
    "\n",
    "```bash\n",
    "pai -name tensorflow1120_ext -project algo_public\n",
    "-Dscript='[odps|oss]://<your project>/resources/<deepfm.tgz>'  # program package\n",
    "-DentryFile='entry.py'\n",
    "-Dcluster='{\\\"ps\\\":{\\\"count\\\":3,\\\"cpu\\\":800,\\\"memory\\\":16000},\\\"worker\\\": {\\\"count\\\":20,\\\"cpu\\\":800,\\\"memory\\\":4000}}'  # cluster resources\n",
    "# MaxCompute or OSS data; a MaxCompute table is used here. This only declares the resource: the model's input parameters are set in -DuserDefinedParameters\n",
    "-Dtables='odps://<your project>/tables/<input table name>'\n",
    "-DcheckpointDir='oss://<your oss bucket>.oss-cn-<region name>-internal.aliyuncs.com/<your dir>'\n",
    "-Darn='<your arn>'\n",
    "# Like -Dtables, this is only a resource declaration, here for the output\n",
    "-Doutputs='odps://<your project>/tables/<output table name>'\n",
    "# Model parameters\n",
    "-DuserDefinedParameters='--mode=train --embedding_size=16 --kv_map=[[1],[1-100],[2],[100-1000]] ....'\n",
    ";\n",
    "```\n",
    "\n",
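    "Note when submitting: on MaxCompute the model does not create the output table, so you must create it manually. Columns are not matched by name either. Create the output table with the same schema as the input feature table, with a `score` column added as the first column."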
   ]
  },
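  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Typing the escaped `-Dcluster` string by hand is error-prone. Below is a minimal sketch that generates it from a Python dict with the standard `json` module. It reproduces the resources of the example above (3 parameter servers, 20 workers). Only the quote-escaping convention of that example is assumed, and the resource numbers should be adapted to your job.\n",
    "\n",
    "```python\n",
    "import json\n",
    "\n",
    "# Same resources as the example command above.\n",
    "cluster = {\n",
    "    'ps': {'count': 3, 'cpu': 800, 'memory': 16000},\n",
    "    'worker': {'count': 20, 'cpu': 800, 'memory': 4000},\n",
    "}\n",
    "# Sorted keys and compact separators give a stable string without spaces.\n",
    "cluster_json = json.dumps(cluster, sort_keys=True, separators=(',', ':'))\n",
    "# Escape the double quotes the same way as in the -Dcluster example.\n",
    "escaped = cluster_json.replace('\"', '\\\\\"')\n",
    "print(\"-Dcluster='\" + escaped + \"'\")\n",
    "```"
   ]
  },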
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
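    "# Advanced Usage\n",
    "## Incremental Training\n",
    "\n",
    "This implementation partially supports incremental training, mainly by reserving space in the kv key range. If you reserve some keys in advance, new keys can be added in later training runs, and in the short term you can avoid a full large-scale retraining.\n",
    "\n",
    "## FM && Deep\n",
    "\n",
    "This implementation keeps the `use_fm` and `use_deep` options from the original author's code. Setting them lets the model degrade to a pure FM or a pure DNN.\n",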
    "\n"
   ]
  },
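  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A toy forward pass shows what the two switches remove. The sketch below is a NumPy illustration, not this implementation's TensorFlow code. It uses the sum form from the DeepFM paper [1], logit = y_FM + y_DNN, whereas the forked code may instead combine the parts through a final projection layer. The FM part uses the standard identity for pairwise interactions, and `deep_fn` is a hypothetical stand-in for the DNN.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def deepfm_logit(x, w, V, deep_fn, use_fm=True, use_deep=True):\n",
    "    # x: (n,) feature values; w: (n,) first-order weights; V: (n, k) embeddings.\n",
    "    xv = V * x[:, None]  # embeddings scaled by their feature values\n",
    "    y_fm = 0.0\n",
    "    if use_fm:\n",
    "        first_order = np.dot(w, x)\n",
    "        # sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f[(sum_i xv_if)^2 - sum_i xv_if^2]\n",
    "        second_order = 0.5 * np.sum(np.sum(xv, axis=0) ** 2 - np.sum(xv ** 2, axis=0))\n",
    "        y_fm = first_order + second_order\n",
    "    # The deep part sees the same scaled embeddings, flattened.\n",
    "    y_deep = deep_fn(xv.ravel()) if use_deep else 0.0\n",
    "    return y_fm + y_deep\n",
    "\n",
    "\n",
    "rng = np.random.RandomState(0)\n",
    "n, k, hidden = 4, 3, 8\n",
    "x, w, V = rng.rand(n), rng.randn(n), rng.randn(n, k)\n",
    "W1, w2 = rng.randn(n * k, hidden), rng.randn(hidden)\n",
    "\n",
    "\n",
    "def deep_fn(h):\n",
    "    # Toy DNN: one ReLU hidden layer followed by a linear output unit.\n",
    "    return float(np.dot(np.maximum(np.dot(h, W1), 0.0), w2))\n",
    "\n",
    "\n",
    "for use_fm, use_deep in [(True, True), (True, False), (False, True)]:\n",
    "    p = 1.0 / (1.0 + np.exp(-deepfm_logit(x, w, V, deep_fn, use_fm, use_deep)))\n",
    "    print('use_fm=%s use_deep=%s p=%.4f' % (use_fm, use_deep, p))\n",
    "```"
   ]
  },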
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# References\n",
    "[1] *DeepFM: A Factorization-Machine based Neural Network for CTR Prediction*, Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, Xiuqiang He."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
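    "# Acknowledgements\n",
    "- He Xiangnan\n",
    "- Chen Chenglong\n",
    "\n",
    "# License\n",
    "MIT\n",
    "\n",
    "# Common Issues\n",
    "- The feature max size is set smaller than the actual kv keys, causing embedding lookup errors.\n",
    "- The mode (train vs. predict) is set incorrectly, so some ops cannot be found.\n",
    "- Multiple keys fall into the same kv map slot in one sample and conflict. The embedding lookup position returns -1 and an error is raised.\n",
    "- Wrong delimiters in the data cause split errors.\n",
    "- Non-numeric values in the data cause type-conversion failures.\n",
    "- Retraining without deleting the existing checkpoint causes a mismatch between the loaded model and the graph."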
   ]
  }
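  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Several of these issues can be caught before training by checking the data. The sketch below is a hypothetical pre-flight check, not part of this implementation. It assumes `|`-separated `key:value` pairs (as in the `result.csv` output above) and integer keys that must be smaller than the feature max size. The kv map itself is not reproduced here. An optional caller-supplied `slot_of(key)` function stands in for it, and without that function each key counts as its own slot. The slot check is meant for regular kv columns only, because the mutual-exclusion rule applies to them.\n",
    "\n",
    "```python\n",
    "def check_kv_field(text, max_size, slot_of=None, pair_sep='|', kv_sep=':'):\n",
    "    # Hypothetical check for one kv cell; returns a list of problem descriptions.\n",
    "    problems = []\n",
    "    seen_slots = set()\n",
    "    for item in text.split(pair_sep):\n",
    "        parts = item.split(kv_sep)\n",
    "        if len(parts) != 2:\n",
    "            problems.append('cannot split %r: wrong delimiter?' % item)\n",
    "            continue\n",
    "        try:\n",
    "            key = int(parts[0])\n",
    "            float(parts[1])  # the value must be a real number\n",
    "        except ValueError:\n",
    "            problems.append('non-numeric key or value in %r' % item)\n",
    "            continue\n",
    "        if key >= max_size:\n",
    "            problems.append('key %d is not below feature max size %d' % (key, max_size))\n",
    "        slot = slot_of(key) if slot_of is not None else key\n",
    "        if slot in seen_slots:\n",
    "            problems.append('slot %r is used more than once' % (slot,))\n",
    "        seen_slots.add(slot)\n",
    "    return problems\n",
    "\n",
    "\n",
    "print(check_kv_field('3:0.2|110:0.5|170:0.2', max_size=1000))\n",
    "print(check_kv_field(':0.3|4:0.3', max_size=1000))\n",
    "```"
   ]
  }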
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
